Learning Spatio-temporal Features with Partial Expression Sequences for on-the-Fly Prediction

نویسندگان

  • Wissam J. Baddar
  • Yong Man Ro
چکیده

Spatio-temporal feature encoding is essential for encoding facial expression dynamics in video sequences. At test time, most spatio-temporal encoding methods assume that a temporally segmented sequence is fed to a learned model, which could require the prediction to wait until the full sequence is available to an auxiliary task that performs the temporal segmentation. This causes a delay in predicting the expression. In an interactive setting, such as affective interactive agents, such delay in the prediction could not be tolerated. Therefore, training a model that can accurately predict the facial expression ”on-the-fly” (as they are fed to the system) is essential. In this paper, we propose a new spatio-temporal feature learning method, which would allow prediction with partial sequences. As such, the prediction could be performed on-thefly. The proposed method utilizes an estimated expression intensity to generate dense labels, which are used to regulate the prediction model training with a novel objective function. As results, the learned spatio-temporal features can robustly predict the expression with partial (incomplete) expression sequences, on-the-fly. Experimental results showed that the proposed method achieved higher recognition rates compared to the state-of-the-art methods on both datasets. More importantly, the results verified that the proposed method improved the prediction frames with partial expression sequence inputs.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

معرفی شبکه های عصبی پیمانه ای عمیق با ساختار فضایی-زمانی دوگانه جهت بهبود بازشناسی گفتار پیوسته فارسی

In this article, growable deep modular neural networks for continuous speech recognition are introduced. These networks can be grown to implement the spatio-temporal information of the frame sequences at their input layer as well as their labels at the output layer at the same time. The trained neural network with such double spatio-temporal association structure can learn the phonetic sequence...

متن کامل

Recognition of Visual Events using Spatio-Temporal Information of the Video Signal

Recognition of visual events as a video analysis task has become popular in machine learning community. While the traditional approaches for detection of video events have been used for a long time, the recently evolved deep learning based methods have revolutionized this area. They have enabled event recognition systems to achieve detection rates which were not reachable by traditional approac...

متن کامل

Modeling of the Relationships Between Spatio-Temporal Changes of Traffic Volume and Particulate Matter-2.5 Pollutant Concentration Based on Geographically Weighted Regression (GWR) and Inverse Distance Weighting (IDW) Model: A Case Study in Tehran M

Background and Aim: High concentrations of particulate matter-25 (PM2.5) have been the cause of the unhealthiest days in Tehran, Iran in recent years. This study was conducted with the aim of the spatio-temporal analysis of traffic volume and its relationship with PM2.5 pollutant concentrations in Tehran metropolis, Tehran during 2015-2018, using the Geographic Information System (GIS). Materi...

متن کامل

Analysis of temporal transcription expression profiles reveal links between protein function and developmental stages of Drosophila melanogaster

Accurate gene or protein function prediction is a key challenge in the post-genome era. Most current methods perform well on molecular function prediction, but struggle to provide useful annotations relating to biological process functions due to the limited power of sequence-based features in that functional domain. In this work, we systematically evaluate the predictive power of temporal tran...

متن کامل

A New Wavelet Based Spatio-temporal Method for Magnification of Subtle Motions in Video

Video magnification is a computational procedure to reveal subtle variations during video frames that are invisible to the naked eye. A new spatio-temporal method which makes use of connectivity based mapping of the wavelet sub-bands is introduced here for exaggerating of small motions during video frames. In this method, firstly the wavelet transformed frames are mapped to connectivity space a...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • CoRR

دوره abs/1711.10914  شماره 

صفحات  -

تاریخ انتشار 2017